We present a framework for ranking images within their class based on the strength of the spurious cues present. By measuring the gap in accuracy between the highest and lowest ranked images (which we call the spurious gap), we assess spurious feature reliance for $89$ diverse ImageNet models, finding that even the best models underperform on images with weak spurious presence. However, the effect of spurious cues varies far more dramatically across classes, emphasizing the crucial, often overlooked, class-dependence of the spurious correlation problem. While most spurious features we observe are clarifying (i.e., they improve test-time accuracy when present, as is typically expected), we surprisingly find many cases of confusing spurious features, where models perform better when they are absent. We then close the spurious gap by training new classification heads on low-ranked images (i.e., those without common spurious cues), resulting in improved effective robustness to distribution shifts (ObjectNet, ImageNet-R, ImageNet-Sketch). We also propose a second metric to assess feature reliability, finding that spurious features are generally less reliable than non-spurious (core) ones, though again, spurious features can be more reliable for certain classes. To enable our analysis, we annotated $5,000$ feature-class dependencies over all of ImageNet as core or spurious using minimal human supervision. Finally, we show that the feature discovery and spuriosity ranking framework can be extended to other datasets such as CelebA and WaterBirds in a lightweight fashion, with only linear layer training, leading to the discovery of a previously unknown racial bias in CelebA hair classification.
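The spurious gap described above can be illustrated with a minimal sketch. It assumes each image of a class already has a scalar spuriosity score and a 0/1 correctness flag; the function names, the cutoff `k`, and the input layout are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def spurious_gap(spuriosity, correct, k):
    """Accuracy on the k highest-ranked images minus accuracy on the k lowest-ranked.

    `spuriosity` holds one score per image (higher = stronger spurious cues),
    `correct` holds 1 where the model classified the image correctly, else 0.
    Both the scoring and the choice of k are assumptions of this sketch.
    """
    spuriosity = np.asarray(spuriosity, dtype=float)
    correct = np.asarray(correct, dtype=float)
    if not 0 < 2 * k <= len(spuriosity):
        raise ValueError("k must be positive and at most half the number of images")
    order = np.argsort(spuriosity, kind="stable")  # ascending spuriosity
    low_acc = correct[order[:k]].mean()    # images with the weakest spurious cues
    high_acc = correct[order[-k:]].mean()  # images with the strongest spurious cues
    return float(high_acc - low_acc)


def per_class_spurious_gap(spuriosity, correct, labels, k):
    """Compute the gap separately for each class, since images are ranked within class."""
    spuriosity = np.asarray(spuriosity, dtype=float)
    correct = np.asarray(correct, dtype=float)
    labels = np.asarray(labels)
    return {
        int(c): spurious_gap(spuriosity[labels == c], correct[labels == c], k)
        for c in np.unique(labels)
    }
```

Under this convention, a positive gap corresponds to a clarifying spurious feature (accuracy is higher when the cue is strong) and a negative gap to a confusing one.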
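Several existing works study either the adversarial or the natural distributional robustness of deep neural networks separately. In practice, however, models need to enjoy both types of robustness to ensure reliability. In this work, we bridge this gap and show that, in fact, explicit tradeoffs exist between adversarial and natural distributional robustness. We first consider a simple linear regression setting on Gaussian data with disjoint sets of core and spurious features. In this setting, through theoretical and empirical analysis, we show that (i) adversarial training with $\ell_1$ and $\ell_2$ norms increases the model's reliance on spurious features; (ii) for $\ell_\infty$ adversarial training, spurious reliance occurs only when the scale of the spurious features is larger than that of the core features; and (iii) adversarial training can have the unintended consequence of reducing distributional robustness, specifically when the spurious correlations change in the new test domain. Next, using a test suite of twenty adversarially trained models compared against their standardly trained counterparts, we present extensive empirical evidence that validates our theoretical results. We also show that spurious correlations in the training data (when preserved in the test domain) can improve adversarial robustness, revealing that previous claims that adversarial vulnerability is rooted in spurious correlations are incomplete.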
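For intuition about the linear regression setting, the following sketch uses a standard identity rather than anything stated in the abstract: for a linear model, the worst-case squared loss under an $\ell_p$-bounded input perturbation of radius $\epsilon$ equals $(|y - w^\top x| + \epsilon \|w\|_q)^2$, where $\ell_q$ is the dual norm. Adversarial training therefore acts like a dual-norm penalty on the weights. The function names and the restriction to $p \in \{1, 2, \infty\}$ are assumptions of this illustration.

```python
import numpy as np

# Dual-norm order q for each perturbation norm p (1/p + 1/q = 1).
_DUAL = {1: np.inf, 2: 2, np.inf: 1}


def adversarial_squared_loss(w, x, y, eps, p):
    """Closed-form max over ||delta||_p <= eps of (y - w @ (x + delta)) ** 2."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    residual = y - w @ x
    return float((abs(residual) + eps * np.linalg.norm(w, ord=_DUAL[p])) ** 2)


def worst_case_perturbation(w, x, y, eps, p):
    """An explicit perturbation that attains the closed-form maximum."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    residual = y - w @ x
    sign = 1.0 if residual >= 0 else -1.0
    if p == np.inf:
        direction = np.sign(w)                 # w @ direction = ||w||_1
    elif p == 2:
        norm = np.linalg.norm(w)
        direction = w / norm if norm > 0 else np.zeros_like(w)  # gives ||w||_2
    elif p == 1:
        direction = np.zeros_like(w)
        j = int(np.argmax(np.abs(w)))
        direction[j] = np.sign(w[j])           # w @ direction = ||w||_inf
    else:
        raise ValueError("this sketch only handles p in {1, 2, inf}")
    # Push the prediction away from the target to enlarge the residual.
    return -sign * eps * direction
```

Because the penalty depends on which norm bounds the perturbation, different threat models regularize the weights on core and spurious coordinates differently, which is the kind of effect the abstract's claims (i) and (ii) concern.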
Data augmentation is a valuable tool for the design of deep learning systems to overcome data limitations and stabilize the training process. Especially in the medical domain, where the collection of large-scale data sets is challenging and expensive due to limited access to patient data and relevant environments as well as strict regulations, community-curated large-scale public data sets, pretrained models, and advanced data augmentation methods are the main factors for developing reliable systems to improve patient care. However, for the development of medical acoustic sensing systems, an emerging field of research, the community lacks large-scale publicly available data sets and pretrained models. To address the problem of limited data, we propose a conditional generative adversarial neural network-based augmentation method that synthesizes mel spectrograms from the learned data distribution of a source data set. In contrast to previously proposed fully convolutional models, the proposed model implements residual Squeeze and Excitation modules in the generator architecture. We show that our method outperforms all classical audio augmentation techniques and previously published generative methods in terms of generated sample quality, and yields a $2.84\%$ improvement in Macro F1-Score for a classifier trained on the augmented data set, an enhancement of $1.14\%$ in relation to previous work. By analyzing the correlation of intermediate feature spaces, we show that the residual Squeeze and Excitation modules help the model reduce redundancy in the latent features. The proposed model therefore advances the state of the art in the augmentation of clinical audio data and alleviates the data bottleneck for the design of clinical acoustic sensing systems.
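Since the headline result is reported as a Macro F1-Score, a minimal sketch of that metric may help: Macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The function name and the convention of scoring an empty class as 0 are assumptions of this illustration, not details taken from the paper.

```python
import numpy as np


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores for integer labels 0..num_classes-1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed 0 when the class never appears.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))
```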
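In this work, we propose a novel data augmentation method for clinical audio datasets based on a conditional Wasserstein Generative Adversarial Network with Gradient Penalty (CWGAN-GP), operating on log spectrograms. To validate our method, we created a clinical audio dataset that was recorded in a real-world operating room during Total Hip Arthroplasty (THA) procedures and contains typical sounds resembling the different phases of the intervention. We demonstrate the capability of the proposed method to generate realistic class-conditioned samples from the dataset distribution and show that training with the generated augmented samples outperforms classical audio augmentation methods in terms of classification accuracy. Performance was evaluated using a ResNet-18 classifier, which shows a mean per-class accuracy improvement of 1.70% in a 5-fold cross-validation experiment when using the proposed augmentation method. Because clinical data is often expensive to acquire, the development of realistic, high-quality data augmentation methods is crucial for improving the robustness and generalization capabilities of learning-based algorithms, which is especially important for safety-critical medical applications. The proposed data augmentation method is therefore an important step towards alleviating the data bottleneck for clinical audio-based machine learning systems.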
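The evaluation metric named above, mean per-class accuracy, can be sketched as follows. It averages the accuracy computed within each class, so it differs from overall accuracy when classes are imbalanced. The function name is hypothetical, and the sketch assumes every class of interest appears at least once in `y_true`.

```python
import numpy as np


def mean_per_class_accuracy(y_true, y_pred):
    """Average, over classes present in y_true, of the accuracy within each class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [
        np.mean(y_pred[y_true == c] == c)  # fraction of class-c samples predicted as c
        for c in np.unique(y_true)
    ]
    return float(np.mean(per_class))
```

For example, with four samples of one class all classified correctly and two samples of another class of which one is correct, overall accuracy is 5/6 while mean per-class accuracy is 0.75.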